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OO ■ Abstract 

o : 

^ We propose a convex-concave programming approach for the labeled weighted graph matching problem. The 

. convex-concave programming formulation is obtained by rewriting the weighted graph matching problem as a 

least-square problem on the set of permutation matrices and relaxing it to two different optimization problems; 
O , a quadratic convex and a quadratic concave optimization problem on the set of doubly stochastic matrices. The 

■ concave relaxation has the same global minimum as the initial graph matching problem, but the search for its 

global minimum is also a hard combinatorial problem. We therefore construct an approximation of the concave 
problem solution by following a solution path of a convex-concave problem obtained by linear interpolation 
of the convex and concave formulations, starting from the convex relaxation. This method allows to easily 
integrate the information on graph label similarities into the optimization problem, and therefore to perform 
, labeled weighted graph matching. The algorithm is compared with some of the best performing graph matching 

methods on four datasets: simulated graphs, QAPLib, retina vessel images and handwritten Chinese characters. 
In all cases, the results are competitive with the state-of-the-art. 
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The graph matching problem is among the most important challenges of graph processing, and plays a central role 
in various fields of pattern recognition. Roughly speaking, the problem consists in finding a correspondence between 
vertices of two given graphs which is optimal in some sense. Usually, the optimality refers to the alignment of graph 
structures and, when available, of vertices labels, although other criteria arc possible as well. A non-exhaustive list 
I of graph matching applications includes document processing tasks like optical character recognition [1,2], image 
analysis (2D and 3D) [3-6], or bioinformatics [7-9]. 

During the last decades, many different algorithms for graph matching have been proposed. Because of the 
' combinatorial nature of this problem, it is very hard to solve it exactly for large graphs, however some methods 
^ I based on incomplete enumeration may be applied to search for an exact optimal solution in the case of small or 
sparse graphs. Some examples of such algorithms may be found in [10-12]. 

Another group of methods includes approximate algorithms which arc supposed to be more scalable. The price 
to pay for the scalability is that the solution found is usually only an approximation of the optimal matching. 
Approximate methods may be divided into two groups of algorithms. The first group is composed of methods 
which use spectral representations of adjacency matrices, or equivalently embed the vertices into a Euclidean space 
where linear or nonlinear matching algorithms can be deployed. This approach was pioneered by Umeyama [13], 
while further different methods based on spectral representations were proposed in [3-5,14,15]. The second group 
of approximate algorithms is composed of algorithms which work directly with graph adjacency matrices, and 
typically involve a relaxation of the discrete optimization problem. The most effective algorithms were proposed 
in [6,16-18]. 
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An interesting instance of the graph matching problem is the matching of labeled graphs. In that case, graph 
vertices have associated labels, which may be numbers, categorical variables, etc... The important point is that 
there is also a similarity measure between these labels. Therefore, when we search for the optimal correspondence 
between vertices, we search a correspondence which matches not only the structures of the graphs but also vertices 
with similar labels. Some widely used approaches for this application only use the information about similarities 
between graph labels. In vision, one such algorithm is the shape context algorithm proposed in [19], which involves 
an efficient algorithm of node label construction. Another example is the BLAST-based algorithms in bioinformatics 
such as the Inparanoid algorithm [20], where correspondence between different protein networks is established on 
the basis of BLAST scores between pairs of proteins. The main advantages of all these methods are their speed 
and simplicity. However, the main drawback of these methods is that they do not take into account information 
about the graph structure. Some graph matching methods try to combine information on graph structures and 
vertex similarities, examples of such method are presented in [7, 18]. 

In this article we propose an approximate method for labeled weighted graph matching, based on a convex- 
concave programming approach which can be applied for matching of graphs of large sizes. Our method is based 
on a formulation of the labeled weighted graph matching problem as a quadratic assignment problem (QAP) over 
the set of permutation matrices, where the quadratic term encodes the structural compatibility and the linear 
term encodes local compatibilities. We propose two relaxations of this problem, resulting in one quadratic convex 
and one quadratic concave minimization problem on the set of doubly stochastic matrices. While the concave 
relaxation has the same global minimum as the initial QAP, solving it is also a hard combinatorial problem. We 
find a local minimum of this problem by following a solution path of a family of convex-concave minimization 
problems, obtained by linearly interpolating between the convex and concave relaxations. Starting from the convex 
formulation with a unique local (and global) minimum, the solution path leads to a local optimum of the concave 
relaxation. We refer to this procedure as the PATH algorithnfl. We perform an extensive comparison of this PATH 
algorithm with several state-of-the-art matching methods on small simulated graphs and various QAP benchmarks, 
and show that it consistently provides state-of-the-art performances while scaling to graphs of up to a few thousands 
vertices on a modern desktop computer. We further illustrate the use of the algorithm on two applications in image 
processing, namely the matching of retina images based on vessel organization, and the matching of handwritten 
Chinese characters. 

The rest of the paper is organized as follows: Section [2] presents the mathematical formulation of the graph 
matching problem. In Section [31 we present our new approach. Then, in Section [H we present the comparison of 
our method with Umeyama's algorithm [13] and the linear programming approach [16] on the example of artificially 
simulated graphs. In Section [5l we test our algorithm on the QAP benchmark library, and we compare obtained 
results with the results published in [18] for the QBP algorithm and graduated assignment algorithms. Finally, in 
Section [6] we present two examples of applications to real- world image processing tasks. 

2 Problem description 

A graph G ~ (V, E) of size N is defined by a finite set of vertices = {1, . . . , TV} and a set of edges E (ZV xV . 
We consider only undirected graphs with no self-loop, i.e., such that £ E £ E and ^ E 

for any vertices i,j £ V. Each such graph can be equivalently represented by a symmetric adjacency matrix A 
of size \V\ X \V\, where Aij is equal to one if there is an edge between vertex i and vertex j, and zero otherwise. 
An interesting generalization is a weighted graph which is defined by association of nonnegative real values Wij 
(weights) to all edges of graph G. Such graphs arc represented by real valued adjacency matrices A with Aij = Wij. 
This generalization is important because in many applications the graphs of interest have associated weights for 
all their edges, and taking into account these weights may be crucial in construction of efficient methods. In the 
following, when we talk about "adjacency matrix" we mean a real-valued "weighted" adjacency matrix. 

Given two graphs G and H with the same number of vertices N, the problem of matching G and H consists 
in finding a correspondence between vertices of G and vertices of H which aligns G and H in some optimal way. 
We will consider in Section [3?8l an extension of the problem to graphs of different sizes. For graphs with the same 
size N, the correspondence between vertices is a permutation of N vertices, which can be defined by a permutation 
matrix P, i.e., a {0, l}-valued N x N matrix with exactly one entry 1 in each column and each row. The matrix P 
entirely defines the mapping between vertices of G and vertices of H, Pij being equal to 1 if the i-th vertex of G 
is matched to the j-th vertex of H, and otherwise. After applying the permutation defined by P to the vertices 

^The PATH algorithm as well as other referenced approximate methods are implemented in the software GraphM available at 
|http : //cblo . ensmp . f r/graphiii| 
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of H we obtain a new graph isomorphic to H which we denote by P{H). The adjacency matrix of the permuted 
graph, Ap(fl'), is simply obtained from Ah by the equahty Ap(^u) = PAhP^ ■ 

In order to assess whether a permutation P defines a good matching between the vertices of G and those of H , 
a quaUty criterion must be defined. Although other choices are possible, we focus in this paper on measuring the 
discrepancy between the graphs after matching, by the number of edges (in the case of weighted graphs, it will be 
the total weight of edges) which are present in one graph and not in the other. In terms of adjacency matrices, 
this number can be computed as: 

Fo(F) ^\\Ag- Ap^H)\?F ^Ug- PAhP'^WI , (1) 

where ||.||_f is the Frobenius matrix norm defined by = IyA^ A — - -^%)- ^ popular alternative to the 

Frobenius norm formulation ([1]) is the 1-norm formulation obtained by replacing the Frobenius norm by the 1-norm 
||A||i — TliiTlij l^y li which is equal to the square of the Frobenius norm ||^|||^ when comparing {0, l}-valued 
matrices, but may differ in the case of general matrices. 

Therefore, the problem of graph matching can be reformulated as the problem of minimizing Fq{P) over the set 
of permutation matrices. This problem has a combinatorial nature and there is no known polynomial algorithm to 
solve it [21]. It is therefore very hard to solve it in the case of large graphs, and numerous approximate methods 
have been developed. 

An interesting generalization of the graph matching problem is the problem of labeled graph matching. Here, 
each graph has associated labels to all its vertices and the objective is to find an alignment that fits well graph 
labels and graph structures at the same time. If we let Cij denote the cost of fitness between the i-th vertex of G 
and the j-th vertex of then the matching problem based only on label comparison can be formulated as follows: 

N N 

min tr(C^P)^^^a,P.,, (2) 

t=l 3 = 1 

where V denotes the set of permutation matrices. A natural way of unifying Q and ([1]) to match both the graph 
structure and the vertices' labels is then to minimize a convex combination [18]: 

min (1 - a)Fo{P) + atr(C^P), (3) 

that makes explicit, through the parameter a S [0,1], the trade-off between cost of individual matchings and 
faithfulness to the graph structure. A small a value emphasizes the matching of structures, while a large a value 
gives more importance to the matching of labels. 

2.1 Permutation matrices 

Before describing how we propose to solve ([1]) and we first introduce some notations and bring to notice some 
important characteristics of these optimization problems. They are defined on the set of permutation matrices, 
which we denoted by V. The set is a set of extreme points of the set of doubly stochastic matrices, that is, 
matrices with nonnegative entries and with row sums and column sums equal to one: T> = {A : Al^^ = 1^^, A^lpj = 
Iat, a > 0}, where l^r denotes the A^-dimensional vector of all ones [22]. This shows that when a linear function is 
minimized over the set of doubly stochastic matrices V, a solution can always be found in the set of permutation 
matrices. Consequently, minimizing a linear function over V is in fact equivalent to a linear program and can thus 
be solved in polynomial time by, e.g., interior point methods [23]. In fact, one of the most efficient methods to 
solve this problem is the Hungarian algorithm, which uses a specific primal-dual strategy to solve this problem in 
0{N^) [24]. Note that the Hungarian algorithm allows to avoid the generic 0{N'^) complexity associated with a 
linear program with N'^ variables. 

At the same time V may be considered as a subset of orthonormal matrices O = {A : A'^A ^ 1} oi T> and in 
fact P = V no. An (idealized) illustration of these sets is presented in Figure [1] the discrete set V of permutation 
matrices is the intersection of the convex set V of doubly stochastic matrices and the manifold O of orthogonal 
matrices. 

2.2 Approximate methods: existing works 

A good review of graph matching algorithms may be found in [25]. Here, we only present a brief description of 
some approximate methods which illustrate well ideas behind two subgroups of these algorithms. As mentioned in 
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Figure 1: Relation between three matrix sets. O — set of orthogonal matrices, V — set of doubly stochastic matrices, 
V = D r\ O — set of permutation matrices. 

the introduction, one popular approach to find approximate solutions to the graph matching problem is based on 
the spectral decomposition of the adjacency matrices of the graphs to be matched. In this approach, the singular 
value decompositions of the graph adjacency matrices are used: 

Ag = UgAgU^, Ah = UhArUJj, 

where the columns of the orthogonal matrices Ug and Uh consist of eigenvectors of Ag and Ah respectively, and 
Ag and Ah are diagonal matrices of eigenvalues. 

If we consider the rows of eigenvector matrices Ug and Uh as graph node coordinates in eigenspaces, then we 
can match the vertices with similar coordinates through a variety of methods [5, 13, 15]. However, these methods 
suffer from the fact that the spectral embedding of graph vertices is not uniquely defined. First, the unit norm 
eigenvectors are at most defined up to a sign flip and we have to choose their signs synchronously. Although it is 
possible to use some normalization convention, such as choosing the sign of each eigenvector in such a way that the 
biggest component is always positive, this usually does not guarantee a perfect sign synchronization, in particular 
in the presence of noise. Second, if the adjacency matrix has multiple eigenvalues, then the choice of eigenvectors 
becomes arbitrary within the corresponding eigen-subspace, as they are defined only up to rotations [26]. 

One of the first spectral approximate algorithms was presented by Umeyama [13]. To avoid the ambiguity 
of eigenvector selection, Umeyama proposed to consider the absolute values of eigenvectors. According to this 
approach, the correspondence between graph nodes is established by matching the rows of \Ug\ and \Uh\ (which 
are defined as matrices of absolute values). The criterion of optimal matching is the total distance between matched 
rows, leading to the optimization problem: 

min II \Ug\~P\Uh\ ||f, 

or equivalently: 

max tr(|[/a||[/GrP). (4) 

The optimization problem (jj]) is a linear program on the set of permutation matrices which can be solved by the 
Hungarian algorithm in 0{N^) [27,28]. 

The second group of approximate methods consists of algorithms which work directly with the objective function 
in ([T]), and typically involve various relaxations to optimizations problems that can be efficiently solved. An 
example of such an approach is the linear programming method proposed by Almohamad and Duffuaa in [16]. 
They considered the 1-norm as the matching criterion for a permutation matrix P V: 

F^{P) = \\Ag - PAhP^Wi = \\AgP - PAhWi. 

The linear program relaxation is obtained by optimizing Fq{P) on the set of doubly stochastic matrices 2? instead 
of -P: 

min (5) 

where the 1-norm of a matrix is defined as the sum of the absolute values of all the elements of a matrix. A priori 
the solution of ([5]) is an arbitrary doubly stochastic matrix X G V, so the final step is a projection of X on the set 
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of permutation matrices (we let denote Hj^X the projection of X onto V) : 

P* = U-pX = argmin \\P - X\\% , 

or equivalently: 

P* = argmaxX^P. (6) 

Per 

The projection ^ can be performed with the Hungarian algorithm, with a complexity cubic in the dimension 
of the problem. The main disadvantage of this method is that the dimensionality (i.e., number of variables and 
number of constraints) of the linear program ^ is 0{N'^), and therefore it is quite hard to process graphs of size 
more than one hundred nodes. 

Other convex relaxations of ([1]) can be found in [18] and [17]. In the next section wc describe our new algorithm 
which is based on the technique of convex-concave relaxations of the initial problems ([1]) and ^ . 



3 Convex-concave relaxation 



Let us start the description of our algorithm for unlabeled weighted graphs. The generalization to labeled weighted 
graphs is presented in Section 13.71 The graph matching criterion we consider for unlabeled graphs is the square 
of the Frobenius norm of the difference between adjacency matrices ([T]). Since permutation matrices are also 
orthogonal matrices (i.e., PP^ ~ I and P^ P ~ I), wc can rewrite Fq{P) on V as follows: 

FoiP) = \\Ag - PAhP^WI - \\{Ag - PAHP^)PfF 
= WAgP-PAhWI. 

The graph matching problem is then the problem of minimizing Fq{P) over V, which we call GM: 

GM: min Fo(P). (7) 

Pev 



3.1 Convex relaxation 

A first relaxation of GM is obtained by expanding the convex quadratic function F(){P) on the set of doubly 
stochastic matrices 2?: 

QCV: min Po(P). (8) 

Pev 

The QCV problem is a convex quadratic program that can be solved in polynomial time, e.g., by the Frank-Wolfe 
algorithm [29] (see Section [23] for more details). However, the optimal value is usually not an extreme points of V, 
and therefore not a permutation matrix. If wc want to use only QCV for the graph matching problem, we therefore 
have to project its solution on the set of permutation matrices, and to make, e.g., the following approximation: 

argnrin Po(-P) ~ Hp argniin Fq{P) . (9) 

Although the projection H-p can be made efficiently in 0{N^) by the Hungarian algorithm, the difficulty with this 
approach is that if argminx) Fo{P) is far from V then the quality of the approximation ([9|) may be poor: in other 
words, the work performed to optimize Fo(P) is partly lost by the projection step which is independent of the cost 
function. The PATH algorithm that we present later can be thought of as a improved projection step that takes 
into account the cost function in the projection. 



3.2 Concave relaxation 

We now present a second relaxation of GM, which results in a concave minimization problem. For that purpose, 
let us introduce the diagonal degree matrix D of an adjacency matrix A, which is the diagonal matrix with entries 
Da = d{i) = X^iLi ^ij for i = 1, . . . , iV, as well as the Laplacian matrix L = D — A. A having only nonnegative 
entries, it is well-known that the Laplacian matrix is positive semidefinite [30]. We can now rewrite Fq{P) as 
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follows: 

Fo(P) ^\\AgP - PAhWI 

= \\{DgP - PDh) - {LgP - PLhWf 

= \\DgP-PDh\\1 (10) 
- 2tv[{DGP - PDnfiLGP - PLh)] 
+ \\LgP-PLh\\1. 

Let us now consider more precisely the second term in this last expression: 

tr[(i^G^ - PDufiLGP - PLh)] 
= U-PP'^D'^Lg + ItLhDJjP'^P - IiP^D^PLh 

- trD^P'^LGP (11) 
. ' 

'l2dp{H)(i)dG(i) 
= Y,{dG{l) - dp(ff)(z))2 = \\Dg - Dp(^H)\\l 

= \\DgP-PDh\\1. 
Plugging (fTT|) into (fTU]) wc obtain 

F^{P) = \\DgP-PDh\\1-2\\DgP-PDh\\1 
+ \\LgP-PLh\\1 

= -\\DgP - PDhWI + \\LgP - PLhWI 

= - ^ P,,{Dg{]) - DH{i)f + tv{PP^LlLG) 

'J / (12) 

+ ti{LjiP'^PLH) -2 tr{P'^LlPLH) 

^ . ' 

vec(P)^(L|^ig)LS)vec(P) 

= -tr(AP) +tr(L^) +tr(L^) 

- 2ycc{Pf{Ljj ® ig)vcc(P) , 

where we introduced the matrix A^.j = {DH{j,j) — £'g(*,*))^ and we used (8) to denote the Kronecker product of 
two matrices (definition of the Kronecker product and some important properties may be found in the appendix 
11). 

Let us denote Fi{P) the part of which depends on P: 

Fi{P) = -tr(AP) - 2ycc{Pf{Lji ® ig)vcc(P). 

From (jl2p wc sec that the permutation matrix which minimizes Fi over V is the solution of the graph matching 
problem. Now. minimizing Fi{P) over V gives us a relaxation of ^ on the set of doubly stochastic matrices. Since 
graph Laplacian matrices are positive semi-dcfinitc, the matrix Lh®Lg is also positive semidcfinite as a Kronecker 
product of two symmetric positive semi-definite matrices [26] . Therefore the function Fi (P) is concave on 2?, and 
we obtain a concave relaxation of the graph matching problem: 

QCC: min Pi(P). (13) 

Interestingly, the global minimum of a concave function is necessarily located at a boundary of the convex set where 
it is minimized [31], so the minimium of Pi(P) on V is in fact in V. 

At this point, we have obtained two relaxations of GM as nonlinear optimization problems on D: the first one 
is the convex minimization problem QCV ([5]), which can be solved efficiently but leads to a solution in 2? that 
must then be projected onto V, and the other is the concave minimization problem QCC which does not have 
an efficient (polynomial) optimization algorithm but has the same solution as the initial problem GM. We note 
that these convex and concave relaxation of the graph matching problem can be more generally derived for any 
quadratic assignment problems [32]. 
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3.3 PATH algorithm 

We propose to approximately solve QCC by tracking a path of local minima over P of a series of functions that 
linearly interpolate between Fq{P) and Fi{P), namely: 

Fx{P) = {l-X)Fo{P) + XFi{P), 

for < A < 1. For all A G [0, 1], F\ is a quadratic function (which is in general neither convex nor concave for A 
away from zero or one). We recover the convex function Fq for A = 0, and the concave function Fi for A = 1. Our 
method searches sequentially local minima of Fx, where A moves from to 1. More precisely, we start at A = 0, and 
find the unique local minimum of Fq (which is in this case its unique global minimum) by any classical QP solver. 
Then, iteratively, we find a local minimum of Fa+^a given a local minimum of -Fa by performing a local optimization 
of F\^d\ starting from the local minimum of Fx, using for example the Frank- Wolfe algorithm. Repeating this 
iterative process for dX small enough we obtain a path of solutions P*(A), where P*{Q) ~ argminpgx) -Fb(P) and 
P*(A) is a local minimum of Fa for all A G [0, 1]. Noting that any local minimum of the concave function Fi on V 
is in V, we finally output F*(l) G 7-" as an approximate solution of GM. 

Our approach is similar to graduated non-convexity [33] : this approach is often used to approximate the global 
minimum of a non-convex objective function. This function consists of two part, the convex component, and 
non-convex component, and the graduated non-convexity framework proposes to track the linear combination of 
the convex and non-convex parts (from the convex relaxation to the true objective function) to approximate the 
minimum of the non-convex function. The PATH algorithm may indeed be considered as an example of such an 
approach. However, the main difference is the construction of the objective function. Unlike [33], we construct two 
relaxations of the initial optimization problem, which lead to the same value on the set of interest (V), the goal 
being to choose convex/concave relaxations which approximate in the best way the objective function on the set 
of permutation matrices. 

The pseudo-code for the PATH algorithm is presented in Figure [2] The rationale behind it is that among the 
local minima of Fi(F) on V, we expect the one connected to the global minimum of Fq through a path of local 
minima to be a good approximation of the global minima. Such a situation is for example shown in Figure [31 where 
in 1 dimension the global minimum of a concave quadratic function on an interval (among two candidate points) 
can be found by following the path of local minima connected to the unique global minimum of a convex function. 

1. Initialization: 

(a) A := 

(b) P* (0) ~ arg min Fq — convex optimization problem, global minimum is found by Frank- Wolfe algorithm. 

2. Cycle over A: 

while A < 1 

(a) Xneio := A + dA 

(b) if |Fa_(P*(A)) - Fa(P*(A))| < CA then 

^ ^new 

else P'^iXnew) — ^I'gi^iii^Aneu; found 

by Frank- Wolfe starting from P*(A) 

A — Xnew 

3. Output: P°"' := P*(l) 

Figure 2: Schema of the PATH algorithm 

More precisely, and although we do not have any formal result about the optimality of the PATH optimization 
method (beyond the lack of global optimality. sec Appendix]^, we can mention a few interesting properties of this 
method: 

• We know from p2)) that for P G P, Fi(P) = Fo(P) — k, where k = tr(FQ) + tr(F|^) is a constant independent 
of P. As a result, it holds for all A G [0, 1] that, for P G P: 

Fa(P) = Fo(P)-Ak. 
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Figure 3: Illustration for path optimization approach. Fq (A = 0) — initial convex function, Fi (A = 1) — initial 
concave function, bold black line — path of function minima P*(A) (A = . . . 0.1 . . . 0.2 . . . 0.3 . . . 0.75 ... 1) 

This shows that if for some A the global minimum of F\{P) over V lies in V, then this minimum is also 
the global minimum of Fq (P) over V and therefore the optimal solution of the initial problem. Hence, if for 
example the global minimum of F\ is found on V by the PATH algorithm (for instance, if Fa is still convex), 
then the PATH algorithm leads to the global optimum of Pi . This situation can be seen in the toy example 
in Figure [3] where, for A = 0.3, F\ has its unique minimum at the boundary of the domain. 

• The sub-optimality of the PATH algorithm comes from the fact that, when A increases, the number of local 
minima of Pa may increase and the sequence of local minima tracked by PATH may not be global minima. 
However we can expect the local minima followed by the PATH algorithm to be interesting approximations 
for the following reason. First observe that if Pi and P2 are two local minima of Pa for some A S [0, 1], 
then the restriction of Pa to (Pi,P2) being a quadratic function it has to be concave and Pi and P2 must 
be on the boundary of T>. Now, let Ai be the smallest A such that Pa has several local minima on V. If 
Pi denotes the local minima followed by the PATH algorithm, and P2 denotes the "new" local minimum of 
Pai, then necessarily the restriction of Pa^ to (Pi,P2) must be concave and have a vanishing derivative in 
P2 (otherwise, by continuity of Pa in A, there would be a local minimum of Pa near P2 for A slightly smaller 
than Ai). Consequently we necessarily have Paj(Pi) < Pai(P2). This situation is illustrated in Figure [3] 
where, when the second local minimum appears for A — 0.75, it is worse than the one tracked by the PATH 
algorithm. More generally, when "new" local minima appear, they are strictly worse than the one tracked 
by the PATH algorithm. Of course, they may become better than the PATH solution when A continues to 
increase. 

Of course, in spite of these justifications, the PATH algorithm only gives an approximation of the global minimum 
in the general case. In Appendix|3 we provide two simple examples when the PATH algorithm respectively succeeds 
and fails to find the global minimum of the graph matching problem. 

3.4 Numerical continuation method interpretation 

Our path following algorithm may be considered as a particular case of numerical continuation methods (sometimes 
called path following methods) [34]. These allow to estimate curves given in the following implicit form: 

T{u) = where T is a mapping T : R^+'^ ~> . (14) 

In fact, our PATH algorithm corresponds to a particular implementation of the so-called Generic Predictor Corrector 
Approach [34] widely used in numerical continuation methods. 

In our case, we have a set of problems minpgD (1 — A)Po(P) + APi(P) parametrized by A G [0, 1]. In other 
words for each A we have to solve the following system of Karush-Kuhn- Tucker (KKT) equations: 

(l-A)VpPo(P) + AVpPi(P) + P^:^ + /is = 0, 

BP-I2N = 0, 
Ps = 0, 
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where S" is a set of active constraints, i.e., of pairs of indices that satisfy Pij = 0, BP — I2N = codes the 
conditions J2j Pij = 1 Vi and Yli Pij = 1 ^ and are dual variables. We have to solve this system for all 
possible sets of active constraints S on the open set of matrices P that satisfy P^j > for (i, j) ^ in order to 
define the set of stationary points of the functions Fa- Now if we let T{P, v, /i, A) denote the left-hand part of the 
KKT equation system then we have exactly ((Ti)) with K = N'^ + 2N + . From the implicit function theorem [35], 
we know that for each set of constraints S, 



is a smooth 1-dimcnsional curve or the empty set and can be parametrized by A. In term of the objective function 
Fx{P), the condition on T' {P,v, ^g^\) may be interpreted as a prohibition for the projection of F\{P) on any 
feasible direction to be a constant. Therefore the whole set of stationary points of F\{P) when A is varying from 
to 1 may be represented as a union VF(A) = iJsWs{X) where each Ws{/^) is homotopic to a 1-dimcnsional segment. 
The set W{X) may have quite complicated form. Some of Ws{)^) may intersect each other, in this case we observe a 
bifurcation point, some of ^^^(A) may connect each other, in this case we have a transformation point of one path 
into another, some of ^^(A) may appear only for A > and/or disappear before A reaches 1. At the beginning the 
PATH algorithm starts from VF0(O), then it follows W0(A) until the border of V (or a bifurcation point). If such an 
event occurs before A = 1 then PATH moves to another segment of solutions corresponding to different constraints 
S, and keeps moving along segments and sometimes jumping between segments until A = 1. As we said in the 
previous section one of the interesting properties of PATH algorithm is the fact that if Wg{\) appears only when 
A = Ai and Wg{\i) is a local minimum then the value of the objective function in Wg{\i) is greater than in 
the point traced by the PATH algorithm. 

3.5 Some implementation details 

In this section we provide a few details relevant for the efficient implementation of the PATH algorithms. 

Frank- Wolfe Among the different optimization techniques for the optimization of Fx (P) starting from the current 
local minimum tracked by the PATH algorithm, we use in our experiments the Frank- Wolfe algorithm which is 
particularly suited to optimization over doubly stochastic matrices [36]. The idea of the this algorithm is to 
sequentially minimize linear approximations of Fq{P). Each step includes three operations: 

1. estimation of the gradient VF\{Pn), 

2. resolution of the linear program P* = 'Argmmp(=D {V Fx{Pn) , P) , 

3. line search: finding the minimum of F\{P) on the segment [P„ P*]. 

An important property of this method is that the second operation can be done efficiently by the Hungarian 
algorithm, in 0{N^). 

Efficient gradient computations Another essential point is that wc do not need to store matrices of size 
iV^ X iV^ for the computation of VFi(/'), because the tensor product in VFi{P) = -vcc(A'^) — 2{Ljj ® Lq)vcc(P) 
can be expressed in terms oi N x N matrix multiplication: 



Ws ={(P, ly, MS, A) : r(P, IV, /is, A) = and 

T'{P, v, fis, A) has the maximal possible rank} 



VPi(P) 



vec(A^) - 2{Ljj ® Lg)vec(P) 
vec(A^) - 2vcc{L^PLh). 



The same thing may be done for the gradient of the convex component 



VPo(P) = V[vec(P)^gvec(P)] 

where Q = {I Aq - Ajj (g, I)'^ {I ® Aq - A^ ® I) 
VFo{P) = 2Qvcc{P) 



= 2vcc{AlP - AlPA^H - AgPAh + PA]j) 
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Initialization The proposed algorithm can be accelerated by the application of Newton algorithm as the first 
step of QCV (minimization of Fq{P)). First, let us rewrite the QCV problem as follows: 



min \\AgP - PAhW p ^ 



min vec(P) Qvec(P) <^ 



'minp vcc{P)^ Qvec{P) 
such that 

BVCC(P) l2N 

vec(P) > Oa,2 



(15) 



where B is the matrix which codes the conditions Pi j = 1 and Pij = 1. The Lagrangian has the following 

j i 

form 

£(P, V, A) ^vcc{P f Qvec{P) + iy'^{Bvec{P) 
-l2Jv) + A*'^vec(F), 

where v and /i are Lagrange multipliers. Now we would like to use Newton method for constrained optimization [36] 
to solve Let fia denote the set of variables associated to the set of active constraints vec(P) = at the current 
points, then the Newton step consist in solving the following system of equations: 



(16) 



More precisely we have to solve ([T5)) for P. The problem is that in general situations this problem is computationally 
demanding because it involves the inversion of matrices of size 0{N'^) x 0{N'^). In some particular cases, however, 
the Newton step becomes feasible. Typically, if none of the constraints vec(P) > are active, then ([16]) takes the 
following fornjj: 



2Q 


B^ 


la 




" vcc(P) " 




" " 


N'^ elements. 


B 














1 


2N elements. 


la 

















# of act. ineq. cons. 



" 2Q 


B^ ■ 




■ vec(P) ■ 




■ " 


B 







ly 




1 



N'^ elements , 
2N elements . 



The solution is then obtained as follows: 

vec(P)_R-_R-T 



1 



Q-^B'^{BQ-'B')-'l 



-1 dTn-1i 



2N- 



(17) 



(18) 



Because of the particular form of matrices Q and B, the expression (|18[) may be computed very simply with the 
help of Kroncckcr product properties in 0{N^) instead of 0{N^). More precisely, the first step is the calculation 

of M = BQ-'^B^ where Q = {I ^ Aa 



Ajj I)^. The matrix Q ^ may be represented as follows: 



Q"' = (Uh «> Ug){I Ag - Aff ® ly^iUH ® Ucf 
Therefore the {i,j)-th element of M is the following product: 

B,Q-'Bj = vec((7^i^^[/G)^)(AG - Ah)'^ 

X veciU^Bj^Un) , 



(19) 



(20) 



where Bi is the i-th row of B and Bi is Bi reshaped into a, N x N matrix. The second step is an inversion of the 
2N X 2N matrix M, and a sum over columns AP ~ M~^12n- The last step is a multiplication of by B^ M'', 
which can be done with the same tricks as the first step. The result is the value of matrix Pkkt- We then have 
two possible scenarios: 



1. If Pkkt G 2^, then we have found the solution of 

2. Otherwise we take the point of intersection of the line (Pqj Pkkt) and the border dV as the next point and 
we continue with Frank- Wolfe algorithm. Unfortunately we can do the Newton step only once, then some of 
P > constraints become active and eflScient calculations are not feasible anymore. But even in this case the 
Newton step is generally very useful because it decreases a lot the value of the objective function. 



^It is true if we start our algorithm, for example, from the constant matrix Pq = -^Ijvl^. 
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dA-adaptation strategy In practice, we found it useful to have the parameter dX in the algorithm of Figure [5] 
vary between iterations. Intuitively, dX should depend on the form of the objective function as follows: if F^{P) 
is smooth and if increasing the parameter A does not change a lot the form of the function, then we can afford 
large steps, in contrast, we should do a lot of small steps in the situation where the objective function is very 
sensitive to changes in the parameter A. The adaptive scheme we propose is the following. First, we fix a constant 
dXrain = 10~^, which represents the lower limit for dX. When the PATH algorithm starts, dX is set to dXmin- 
If we see after an update Xnew = X + dX that \Fx„^^{P* {X}) — Fx{P*{X))\ < e\ then we double dX and keep 
multiplying dX by 2 as long as \Fx^^^{P* {X)) — F\{P*{X))\ < e\. On the contrary, if dX is too large in the sense 
that |Fa„,„(P*(A)) -Fa(F*(A))| >"el, then we divide dX by 2 until the criterion \Fx,,,^.iP*{X)) - Fx{P*{X))\ < ex 
is met, or dX = dXmin- Once the update on dX is done, we run the optimization (Frank- Wolfe) for the new value 
A + dX. The idea behind this simple adaptation schema is to choose dX which keeps |^A„e„ (^*('^)) ^ F\{P* {X))\ 
just below e\. 



Stopping criterion The choice of the update criterion \F\^^^{P* {X)) — F\{P*{X))\ is not unique. Here we check 
whether the function value has been changed a lot at the given point. But in fact it may be more interesting to 
trace the minimum of the objective function. To compare the new minimum with the current one, we need to check 
the distance between these minima and the difference between function values. It means that we use the following 
condition as the stopping criterion 

I^A„_(P*(A„e»)) - Fx{P*{X))\ < ef and 
\\P*{Xne^o)-P*{X)\\<e^ 

Although this approach takes a little bit more computations (we need to run Frank- Wolfe on each update of 
dA), it is quite efficient if we use the adaptation schema for dX. 

To fix the values and we use a parameter M which defines a ratio between these parameters and the 
parameters of the stopping criterion used in the Frank- Wolfe algorithm: ep^r (limit value of function decrement) 
and e|^i4/ (limit value of argument changing): = Me^y^ and = Me^y^. The parameter M represents an 
authorized level of stopping criterion relaxation when wc increment A. In practice, it means that when we start 
to increment A we may move away from the local minima and the extent of this move is defined by the parameter 
M . The larger the value of M , the further we can move away and the larger dX may be used. In other words, the 
parameter Al controls the width of the tube around the path of optimal solutions. 



3.6 Algorithm complexity 

Here we present the complexity of the algorithms discussed in the paper. 

• Umcyama's algorithm has three components: matrix multiplication, calculation of eigenvectors and applica- 
tion of the Hungarian algorithm for Complexity of each component is equal to 0{N^). Thus Umeyama's 
algorithm has complexity 0{N^). 



LP approach ([5|) has complexity 0{N'^) (worst case) because it may be rewritten as an linear optimization 
problem with 3 ^ variables [23] . 



In the PATH algorithm, there are three principal parameters which have a big impact on the algorithm complex- 
ity. These parameters are ef^^y, ^'^ ^"^^ first parameter epw defines the precision of the Frank- Wolfe 
algorithm, in some cases its speed may be sublinear [36], however it should work much better when the optimization 
polytope has a "smooth" border [36]. 

The influence of the ratio parameter M is more complicated. In practice, in order to ensure that the objective 
function takes values between and 1, wc usually use the normalized version of the objective function: 



Fnorra {F) 



\\AgP~PAh 



|2 



II^gIII^ + 11^^111, 

In this case if we use the simple stopping criterion based on the value of the objective function then the number 

c 



of iteration over A (number of Frank- Wolfe algorithm runs) is at least equal to -rJ^^ — where C 



min Fnorrn • 
V 



The most important thing is how the algorithm complexity depends on the graph size A^. In general the number 
of iterations of the Frank- Wolfe algorithm scales as O ( -p^ ) where k is the conditional number of the Hessian 
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matrix describing the objective function near a local minima [36]. It means that in terms of numbers of iterations, 
the parameter N is not crucial. N defines the dimensionality of the minimization problem, while k may be close to 
zero or one depending on the graph structures, not explicitly on their size. On the other hand, N has a big influence 
on the cost of one iteration. Indeed, in each iteration step we need to calculate the gradient and to minimize a 
linear function over the polytope of doubly stochastic matrices. The gradient estimation and the minimization may 
be done in 0{N^). In Section [4.21 we present the empirical results on how algorithm complexity and optimization 
precision depend on M (Figure Ub) and N (Figure [HI). 

3.7 Vertex pairwise similarities 

If we match two labeled graphs, then we may increase the performance of our method by using information on 
pairwise similarities between their nodes. In fact one method of image matching uses only this type of information, 
namely shape context matching [19]. To integrate the information on vertex similarities we use the approach 
proposed in but in our case we use F\{P) instead of Fo{P) 

nun F^{P) = nun (1 - a)Fx{P) + atr(C^P), . (21) 

The advantage of the last formulation is that F"{P) is just F\{P) with an additional linear term. Therefore we 
can use the same algorithm for the minimization of F"{P) as the one we presented for the minimization of F\{P). 

3.8 Matching graphs of different sizes 

Often in practice we have to match graphs of different sizes Nq and Nh (suppose, for example, that Nq > Nh)- 
In this case we have to match all vertices of graph iJ to a subset of vertices of graph G. In the usual case when 
Ng — Nh, the error ([1]) corresponds to the number of mismatched edges (edges which exist in one graph and do 
not exist in the other one). When we match graphs of different sizes the situation is a bit more complicated. Let 
Vq C Vg denote the set of vertices of graph G that are selected for matching to vertices of graph H, let Vq = Vg\Vq 
denote all the rest. Therefore all edges of the graph G are divided into four parts Eg = Eq^ U E^~ U E^'^ U Eq~ , 
where Eq~^ are edges between vertices from Vq , Eq~ are edges between vertices from Vq , E^~ and E'^~ are 
edges from Vq to Vq and from Vq to Vq respectively. For undirected graphs the sets Eq~ and Eq~ are the 
same (but, for directed graphs we do not consider, they would be different). The edges from Eq~, Eq^ and E^'^ 
are always mismatched and a question is whether we have to take them into account in the objective function or 
not. According to the answer we have three types of matching error (four for directed graphs) with interesting 
interpretations. 

1. We count only the number of mismatched edges between H and the chosen subgraph C G. It corresponds 
to the case when the matrix P from ([T|) is a matrix of size Ng x Nh and Ng — Nh rows of P contain only 
zeros. 

2. We count the number of mismatched edges between H and the chosen subgraph G^ C G. And we also count 
all edges from Eq~ , Eq~ and Eq^. In this case P from ([T]) is a matrix of size Ng x Ng- And we transform 
matrix Ah into a matrix of size of size Ng x Ng by adding Ng ~ Nh zero rows and zero columns. It means 
that we add dummy isolated vertices to the smallest graph, and then we match graphs of the same size. 

3. We count the number of mismatched edges between H and chosen subgraph G+ C G. And we also count all 
edges from E^^ (or Eq~^). It means that we count matching error between H and G^ and we count also the 
number of edges which connect G"*" and G~ . In other words we are looking for subgraph G^ which is similar 
to H and which is maximally isolated in the graph G. 

Each type of error may be useful according to context and interpretation, but a priori, it seems that the best 
choice is the second one where we add dummy nodes to the smallest graph. The main reason is the following. 
Suppose that graph H is quite sparse, and suppose that graph G has two candidate subgraphs Gf (also quite 
sparse) and Gj (dense). The upper bound for the matching error between H and Gf is ifVn + #Vq+, the lower 
bound for the matching error between H and Gj" is #Vq+ — #Vh. So if + #Vq+ < — #Vff then we 

will always choose the graph Gj" with the first strategy, even if it is not similar at all to the graph H. The main 
explanation of this effect lies in the fact that the algorithm tries to minimize the number of mismatched edges, and 
not to maximize the number of well matched edges. In contrast, when we use dummy nodes, we do not have this 
problem because if we take a very sparse subgraph G+ it increases the number of edges in G~ (the common number 
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of edges in G+ and G is constant and is equal to the number of edges in G) and finally it decreases the quality 
of matching. 

4 Simulations 

4.1 Synthetic examples 

In this section we compare the proposed algorithm with some classical methods on artificially generated graphs. 
Our choice of random graph types is based on [37] where authors discuss different types of random graphs which 
are the most frequently observed in various real world applications (world wide web, collaborations networks, 
social networks, etc.). Each type of random graphs is defined by the distribution function of node degree 
Prob(node degree = k) = VD{k). The vector of node degrees of each graph is supposed to be an i.i.d sample 
from VD[k). In our experiments we have used the following types of random graphs; 



Binomial law 


VD{k) = Gy{i~py-'' 


Geometric law 


VD{k) = (1 - e-^)e'^'= 


Power law 


VD{k) = Crk-^ 



The schema of graph generation is the following 

1. generate a sample d = {di, . . . , djv) from VD{k) 

2. if di is odd then goto step 1 

3. while Y.^ dt > 

(a) choose randomly two non-zero elements from d: dni and dn2 

(b) add edge (nl,n2) to the graph 

(c) dnl ^ d„i - 1 dn2 ^ dn2 - 1 

If we are interested in isomorphic graph matching then we compare just the initial graph and its randomly permuted 
copy. To test the matching of non-isomorphic graphs, we add randomly aNE edges to the initial graph and to its 
permitted copy, where Ne is the number of edges in the original graph, and a is the noise level. 

4.2 Results 

The first series of experiments are experiments on small size graphs (N=:8), here we are interested in comparison of 
the PATH algorithm (sec Figure [2]), the QCV approach ([8]), Umcyama spectral algorithm the linear program- 
ming approach ([5|) and exhaustive search which is feasible for the small size graphs. The algorithms were tested on 
the three types of random graphs (binomial, exponential and power). The results are presented in Figure |31 The 




noise level noise level noise level 

(a) bin (b) cxp (c) pow 



Figure 4: Matching error (mean value over sample of size 100) as a function of noise. Graph size N=8. U 
— Umeyama's algorithm, LP — linear programming algorithm, QCV — convex optimization, PATH — path 
minimization algorithm, OPT — an exhaustive search (the global minimum). The range of error bars is the standard 
deviation of matching errors 
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Figure 5: Matching error (mean value over sample of size 100) as a function of noise. Graph size N=20. U 
— Umeyama's algorithm, LP — linear programming algorithm, QCV — convex optimization, PATH — path 
minimization algorithm. 
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Figure 6: Matching error (mean value over sample of size 100) as a function of noise. Graph size N=100. U 
Umeyama's algorithm, QCV — convex optimization, PATH — path minimization algorithm. 



same experiment was repeated for middle-sized graphs (A^ = 20, Figure (5]) and for large graphs (A ~ 100, Figure 

ii). 

In all cases, the PATH algorithm works much better than all other approximate algorithms. There are some 
important things to note here. First, the choice of norm in ([T]) is not very important — results of QCV and LP 
are about the same. Second, following the solution paths is very useful compared to just minimizing the convex 
relaxation and projecting the solution on the set of permutation matrices (PATH algorithms works much better 
than QCV). Another noteworthy observation is that the performance of PATH is very close to the optimal solution 
when the later can be evaluated. 

We note that sometimes the matching error decreases as the noise level increases (e.g., in Figures EfcHt), which 
can be explained as follows. The matching error is upper bounded by the minimum of the total number of zeros 
in the adjacency matrices Aq and Ah^ so in general this upper bound deacreases when the edge density increases. 
When the noise level increases, it makes graphs denser, and consequently the upper bound of matching error 
decreases. The general behavior of graph matching algorithms as functions of the graph density is presented in 
Figure [7^). Here again the matching error decreases when the graph density becomes very large. 

The parameter M (see section 13. 5p defines how precisely the PATH algorithm tries to follow the path of local 
minimas. The larger M, the faster the PATH algorithm. At the extreme, when M is close to Xj^pv/. wc jump 
directly from the convex function (A = 0) to the concave one (A = 1). Figure [71d) shows in more details how 
algorithm speed and precision depend on M. 

Another important aspect to compare the different algorithms is their run-time complexity as a function of 
A. Figure [8] shows the time needed to obtain the matching between two graphs as a function of the number of 
vertices N (for A between 10 and 100), for the different methods. These curves are coherent with theoretical values 
of algorithm complexities summarized in Section 13.61 In particular we observe that Umeyama's algorithm is the 
fastest method, but that QCV and PATH have the same complexity in A. The LP method is competitive with 
QCV and PATH for small graphs, but has a worse complexity in A. 



14 




Figure 7: (a) Algorithm performance as a function of graph density, (b) Precision and speed of the PATH algorithm 
as a function of M, the relaxation constant used in the PATH algorithm (see section [XS]) . In both cases, graph 
size N=100, noise level cr=0.3; sample size is equal to 30. Error bars represent standard deviation of the matching 
error (not averaged) 




loglb(N) loglb(N) loglO(N) 



(a) bin (b) exp (c) pow 



Figure 8: Timing of U,LP,QCV and PATH algorithms as a function of graph size, for the different random graph 
models. LP slope « 6.7, U, QCV and PATH slope « 3.4 
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Tabic 1: Experiment results for QAPLIB benchmark datasets. 



QAP 


MIN 


PATH 


QPB 


GRAB 


U 


clirl2c 


11156 


18048 


20306 


19014 


40370 


chrl5a 


9896 


19086 


26132 


30370 


60986 


clirlSc 


9504 


16206 


29862 


23686 


76318 


clir20b 


2298 


5560 


6674 


6290 


10022 


clir22b 


6194 


8500 


9942 


9658 


13118 


cscl6b 


292 


300 


296 


298 


306 


roul2 


235528 


256320 


278834 


273438 


295752 


roul5 


354210 


391270 


381016 


457908 


480352 


rou20 


725522 


778284 


804676 


840120 


905246 


tailOa 


135028 


152534 


165364 


168096 


189852 


tail5a 


388214 


419224 


455778 


451164 


483596 


tail7a 


491812 


530978 


550852 


589814 


620964 


tai20a 


703482 


753712 


799790 


871480 


915144 


taiSOa 


1818146 


1903872 


1996442 


2077958 


2213846 


tai35a 


2422002 


2555110 


2720986 


2803456 


2925390 


tai40a 


3139370 


3281830 


3529402 


3668044 


3727478 



5 QAP benchmark library 

The problem of graph matching may be considered as a particular case of the quadratic assignment problem (QAP). 
The minimization of the loss function ([T]) is equivalent to the maximization of the following function: 

max ti{P'^A^PAH) ■ 

Therefore it is interesting to compare our method with other approximate methods proposed for QAP. [18] proposed 
the QPB algorithm for that purpose and tested it on matrices from the QAP benchmark library [38] , QPB results 
were compared to the results of graduated assignment algorithm GRAB [17] and Umeyama's algorithm. Results 
of PATH application to the same matrices are presented in Table [1] scores for QPB and graduated assignment 
algorithm are taken directly from the publication [18]. We observe that on 14 out of 16 benchmark, PATH is the 
best optimization method among the methods tested. 

6 Image processing 

In this section, we present two applications in image processing. The first one (Section 16. ip illustrates how taking 
into account information on graph structure may increase image alignment quality. The second one (Section 16. 2p 
shows that the structure of contour graphs may be very important in classification tasks. In both examples we 
compare the performance of our method with the shape context approach [19], a state-of-the-art method for image 
matching. 

6.1 Alignment of vessel images 

The first example is dedicated to the problem of image alignment. We consider two photos of vessels in human 
eyes. The original photos and the images of extracted vessel contours (obtained from the method of [39]) are 
presented in Figure [HI To align the vessel images, the shape context algorithm uses the context radial histograms 
of contour points (see [19]). In other words, according to the shape context algorithm one aligns points which have 
similar context histograms. The PATH algorithm uses also information about the graph structure. When we use 
the PATH algorithm we have to tune the parameter a ([2T|) . we tested several possible values and we took the one 
which produced the best result. To construct graph we use all points of vessel contours as graph nodes and we 
connect all nodes within a circle of radius r (in our case we use r = 50). Finally, to each edge («, j) we associate 
the weight Wij = exp(— [xi — yj\)- 

A graph matching algorithm produces an alignment of image contours, then to align two images we have to 
expand this alignment to the rest of image. For this purpose, we use a smooth spline-based transformation [40]. In 
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Figure 9: Eye photos (top) and vessel contour extraction (bottom). 



other words, we estimate parameters of the sphne transformation from the known ahgnment of contour points and 
then we apply this transformation to the whole image. Results of image matching based on shape context algorithm 
and on PATH algorithm are presented in Figure [101 where black lines designate connections between associated 
points. We observe that the context shape method creates many unwanted matching, while PATH produces a 
matching that visually corresponds to a correct alignment of the structure of vessels. The main reason why graph 
matching works better than shape context matching is the fact that shape context does not take into account the 
relational positions of matched points and may lead to totally incoherent graph structures. In contrast, graph 
matching tries to match pairs of nearest points in one image to pairs of nearest points in another one. 

Among graph matching methods, different results are obtained with different optimization algorithms. Table [2] 
shows the matching errors produced by different algorithms on this vessel alignment problem. The PATH algorithm 
has the smallest matching error, with the alignment shown on Figure fTOl QCV comes next, with an alignment 
that is also visually correct. On the other hand, the Umeyama algorithm has a much larger matching error, and 
visually fails to find a correct alignment, similar to the shape context method. 



Table 2: Alignment of vessel images, algorithm performance 



Method 


Shape context 


Umeyama 


QCV 


PATH 


matching error 


870.61 


764.83 


654.42 


625.75 



6.2 Recognition of handwritten Chinese characters 

Another example that we consider in this paper is the problem of Chinese character recognition from the ETL9B 
dataset [41]. The main idea is to use a score of optimal matching as a similarity measure between two images of 
characters. This similarity measure can be used then in machine learning algorithms, K-nearest neighbors (KNN) 
for instance, for character classification. Here we compare the performance of four methods: linear support vector 
machine (SVM), SVM with gaussian kernel, KNN based on score of shape context matching and KNN based on 
scores from graph matching which combines structural and shape context information. As a score, we use just the 
value of the objective function ([2T|) at the (locally) optimal point. We have selected three Chinese characters known 
to be difiicult to distinguish by automatic methods. Examples of these characters as well as examples of extracted 
graphs (obtained by thinning and uniformly subsampling the images) are presented in Figure 1111 For SVM based 
algorithms, we use directly the values of image pixels (so each image is represented by a binary vector), in graph 
matching algorithm we use binary adjacency matrices of extracted graphs and shape context matrices (see [19]). 
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Figure 10: Comparison of alignment based on shape context (top) and alignment based on the PATH optimization 
algorithm (bottom). For each algorithm we present two alignments: image '1' on image '2' and the inverse. Each 
alignment is a spline-based transformation (see text). 
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Our data set consist of 50 exemples (images) of each class. Each image is represented by 63 x 64 binary matrix. 
To compare different methods we use the cross vahdation error (five folds). The dependency of classification error 
from two algorithm parameters (a — coefficient of linear combination (j2ip and k — number of nearest neighbors 
used in KNN)) is shown in Figure [T^ 




CD 

.9 0.5 



^ 0.31 




I 0.iSK^^^ H^a=1 P 



0.2. 



(a) 



(b) 



Figure 12: (a) Classification error as a function of a. (b) Classification error as a function of k. Classification error 
is estimated as cross validation error (five folds, 50 repetition), the range of the error bars is the standard deviation 
of test error over one fold (not averaged over folds and repetition) 



Two extreme choices a = 1 and a — correspond respectively to pure shape context matching, i.e., when only 
node labels information is used, and pure unlabeled graph matching. It is worth observing here that KNN based 
just on the score of unlabeled graph matching does not work very well, the classification error being about 60%. 
An explanation of this phenomenon is the fact that learning patterns have very unstable graph structure within 
one class. The pure shape context method has a classification error of about 39%. The combination of shape 
context and graph structure informations allows to decrease the classification error down to 25%. Beside the PATH 
algorithm, we tested also the QCV algorithm and the Umeyama algorithm,the Umeyama algorithm almost does 
not decrease the classification error. The QCV algorithm works better than then Umeyama algorithm, but still 
worse than the PATH algorithm. Complete results can be found in Table [31 



Table 3: Classification of Chinese characters. {CV, STD) — mean and standard deviation of test error over cross- 
validation runs (five folds, 50 repetitions) 



Method 


CV 


STD 


Linear SVM 


0.377 


± 0.090 


SVM with gaussian kernel 


0.359 


± 0.076 


KNN (PATH) (a=l): shape context 


0.399 


± 0.081 


KNN (PATH) (a=0.4) 


0.248 


± 0.075 


KNN (PATH) (a=0): pure graph matching 


0.607 


± 0.072 


KNN (U) (a=0.9): a best choice 


0.382 


± 0.077 


KNN (QCV) (a=0.3): a best choice 


0.295 


± 0.061 



7 Conclusion 

We have presented the PATH algorithm, a new technique for graph matching based on convex-concave relaxations 
of the initial integer programming problem. PATH allows to integrate the alignment of graph structural elements 
with the matching of vertices with similar labels. Its results are competitive with state-of-the-art methods in several 
graph matching and QAP benchmark experiments. Moreover, PATH has a theoretical and empirical complexity 
competitive with the fastest available graph matching algorithms. 

Two points can be mentioned as interesting directions for further research. First, the quality of the convex- 
concave approximation is defined by the choice of convex and concave relaxation functions. Better performances 
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may be achieved by more appropriate choices of these functions. Second, another interesting point concerns the 
construction of a good concave relaxation for the problem of directed graph matching, i.e., for asymmetric adjacency 
matrix. Such generalizations would be interesting also as possible polynomial-time approximate solutions for the 
general QAP problem. 



A A toy example 

The PATH algorithm does not generally find the global optimum of the NP-complcte optimization problem. In 
this appendix we illustrate with two examples how the set of local optima tracked by PATH may or may not lead 
to the global optimum. 

More precisely, we consider two simple graphs with the following adjacency matrices: 





"0 


1 


l" 




"0 


1 


0' 


G = 


1 








and H = 


1 










1 




















Let C denote the cost matrix of vertex association 



0.1691 0.0364 1.0509 
0.6288 0.5879 0.8231 
0.8826 0.5483 0.6100 

Let us suppose that we have fixed the tradeoff a = 0.5, and that our objective is then to find the global minimum 
of the following function: 

Fo(P) = 0.5||GF-Pi/||| + 0.5tr(C"P), PcV. (22) 

As explained earlier, the main idea underlying the PATH algorithm is to try to follow the path of global minima of 
F\{P) pl]) . This may be possible if all global minima form a continuous path, which is not true in general. In 
the case of small graphs we can find the exact global minimum of F"(P) for all A. The trace of global minima as 
functions of A is presented in Figure [THT a) (i.e., we plot the values of the nine parameters of the doubly stochastic 
matrix, which are, as expected, all equal to zero or one when A ~ 1). When A is near 0.2 there is a jump of global 
minimum from one face to another. However if we change the linear term C to 



c 



0.4376 0.3827 0.1798 
0.3979 0.3520 0.2500 
0.1645 0.2653 0.5702 



then the trace becomes smooth (see FigurcfT^lb)) and the PATH algorithm then finds the globally optimum point. 
Characterizing cases where the path is indeed smooth is the subject of ongoing research. 




0.5 

(a) 




0.5 

(b) 



Figure 13: Nine coordinates of global minimum of as a function of A 
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B Kronecker product 

The Kronecker product of two matrices A(E) B is defined as follows: 

Bail ■ ■ ■ Bain 
A®B= ■■ : 

_Bami ■ ■ ■ Bamn 

Two important properties of Kronecker product that we use in this paper are: 

(A^ ® S)vec(X) = Yec{BXA), 
and tiiX^AXB^) = vec{Xf{B ® A)vec(X) . 
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